Shimokawabe, Takashi*; Endo, Toshio*; Onodera, Naoyuki; Aoki, Takayuki*
Proceedings of 2017 IEEE International Conference on Cluster Computing (IEEE Cluster 2017) (Internet), p.525 - 529, 2017/09
Stencil-based applications such as CFD have achieved high performance on GPU supercomputers. Their problem sizes, however, are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality improvement technique that combines temporal blocking with memory swapping between host and device enables large computations beyond the device memory capacity. Our high-productivity stencil framework automatically applies temporal blocking to the boundary exchange required for stencil computation and supports automatic memory swapping provided by an MPI/CUDA wrapper library. A framework-based application simulating the airflow in an urban city maintains 80% of its performance even when the problem is twice as large as the GPU memory capacity, and it has demonstrated good weak scalability on the TSUBAME 2.5 supercomputer.
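Temporal blocking can be illustrated on a toy problem. The Python sketch below is a generic illustration, not the framework's code: it uses a hypothetical 1D three-point averaging stencil with fixed end cells, and the names `step`, `naive`, and `temporally_blocked` are made up for this example. Each spatial block is loaded once together with a halo of `steps` cells on each side and then advanced `steps` time steps locally. Stale halo values contaminate the local copy by only one cell per step, so after `steps` steps the block interior still matches the step-by-step result.

```python
from typing import List


def step(u: List[float]) -> List[float]:
    """One explicit 3-point averaging step; the two end cells are held fixed."""
    out = u[:]  # copy, so the end cells keep their values
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def naive(u: List[float], steps: int) -> List[float]:
    """Reference: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u: List[float], steps: int, block: int) -> List[float]:
    """Advance `steps` time steps block by block, using a halo `steps` cells wide.

    The local copy's edges are held fixed by `step`, which is wrong for interior
    cuts, but the error moves inward only one cell per step, so cells
    lo..hi-1 are exact after `steps` steps.
    """
    n = len(u)
    result = u[:]
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        a = max(0, lo - steps)  # left edge of block plus halo
        b = min(n, hi + steps)  # right edge of block plus halo
        local = u[a:b]          # "load" the block once
        for _ in range(steps):  # several time steps without reloading
            local = step(local)
        result[lo:hi] = local[lo - a:hi - a]  # keep only the exact interior
    return result


u0 = [float((i * 7) % 11) for i in range(50)]
ref = naive(u0, 4)
blk = temporally_blocked(u0, 4, 8)
print(max(abs(x - y) for x, y in zip(ref, blk)))
```

In an out-of-core setting, each block would move between host and device memory once per `steps` time steps instead of once per step, at the cost of redundant computation in the halo. That data movement, and the MPI boundary exchange, are not modeled in this sketch.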
Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*
No journal
A real-time simulation of the environmental dynamics of radioactive substances is very important from the viewpoint of nuclear security. Because many tall buildings and complex structures make the airflow in urban cities turbulent, large-scale CFD simulations are needed. To this end, a CFD code based on the lattice Boltzmann method (LBM) with a block-based adaptive mesh refinement (AMR) method has been developed. Because the conventional LBM with a single-relaxation-time collision operator often becomes numerically unstable at high Reynolds numbers, we apply a state-of-the-art cumulant collision operator. The code is developed on a GPU cluster at JAEA. Using new functions in CUDA 8.0, the GPU kernel functions are tuned for high performance on the latest Pascal GPU architecture. By introducing a temporal blocking technique, we achieve 488 MLUPS per GPU and significantly reduce the number of MPI communications.
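The performance figure in this abstract is given in MLUPS (million lattice updates per second). As a rough worked example, the Python sketch below converts that rate into wall-clock time per step for an assumed 256³ subdomain per GPU (the abstract does not state the grid size, so this is hypothetical). It also counts halo exchanges under the assumption that temporal blocking with a halo `depth` cells wide allows `depth` steps between exchanges; the depth actually used is not given in the abstract.

```python
import math


def mlups(cells: int, steps: int, seconds: float) -> float:
    """Million lattice-node updates per second: cells * steps / (seconds * 1e6)."""
    return cells * steps / (seconds * 1e6)


def seconds_per_step(cells: int, rate_mlups: float) -> float:
    """Wall-clock time for one time step of `cells` nodes at a given MLUPS rate."""
    return cells / (rate_mlups * 1e6)


def halo_exchanges(steps: int, depth: int) -> int:
    """Number of exchanges if each one carries a halo `depth` cells wide,
    so that `depth` time steps can run between exchanges (assumption)."""
    return math.ceil(steps / depth)


cells = 256 ** 3  # hypothetical per-GPU subdomain, not taken from the abstract
t = seconds_per_step(cells, 488.0)
print(f"{t * 1e3:.1f} ms per step")                     # 34.4 ms per step
print(halo_exchanges(1000, 1), halo_exchanges(1000, 4))  # 1000 250
```

Fewer exchanges come at the price of larger messages and some redundant computation in the halo, so the best depth depends on the balance between network latency and compute throughput.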
Onodera, Naoyuki
No journal
SPEEDI and its worldwide version (WSPEEDI) were developed to predict the off-site diffusion of radioactive substances over wide areas, on a scale of ~100 km, based on a mesoscale meteorological model. In this work, we apply two new ingredients, GPUs and an adaptive mesh refinement (AMR) method, to the lattice Boltzmann method (LBM). In this report, we confirmed good scalability on a GPU-rich supercomputer and showed that our code can reproduce a wind tunnel experiment. We conclude that the present LBM is one of the most promising approaches to realizing real-time simulation.
Onodera, Naoyuki
No journal
Simulations of the dispersion of radioactive substances attract high social interest and must be both fast and accurate. Performing real-time simulations with high-resolution meshes at the scale of human living areas, such as alleyways and buildings, requires simulation schemes that can fully exploit high computational performance. In this study, we introduced a nudging-based data assimilation method into the lattice Boltzmann method (LBM) so that we can perform plume dispersion simulations for urban areas.
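Nudging (Newtonian relaxation) adds a term G (u_obs - u) to the model tendency so that the simulated field is pulled toward observations. The abstract does not specify how this term enters the LBM (for example, as a body force), so the Python sketch below is only a generic illustration on a plain list of values. `nudge` and `assimilate` are hypothetical names, the model step is an identity placeholder, and the relaxation coefficient `g` and time step `dt` are arbitrary.

```python
from typing import Callable, Dict, List


def nudge(u: List[float], obs: Dict[int, float], g: float, dt: float) -> List[float]:
    """One explicit nudging increment u += dt * g * (u_obs - u) at observed cells.

    Cells without an observation are left unchanged. The misfit shrinks
    monotonically only if 0 < g * dt <= 1.
    """
    out = u[:]
    for i, target in obs.items():
        out[i] += dt * g * (target - u[i])
    return out


def assimilate(
    u: List[float],
    obs: Dict[int, float],
    g: float,
    dt: float,
    n_steps: int,
    model_step: Callable[[List[float]], List[float]] = lambda v: v,
) -> List[float]:
    """Alternate a model step (identity placeholder here) with a nudging step."""
    for _ in range(n_steps):
        u = model_step(u)         # advance the flow model
        u = nudge(u, obs, g, dt)  # then relax toward the observations
    return u


u = assimilate([0.0, 0.0, 0.0], {1: 1.0}, g=0.5, dt=0.2, n_steps=10)
print(u)  # u[1] equals 1 - 0.9**10 (about 0.651); u[0] and u[2] stay 0.0
```

With the identity model, the misfit at an observed cell shrinks by a factor (1 - g*dt) per step. In practice, a large G lets the observations override the model dynamics, while a small G barely corrects the simulation.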